Abstract



The capacity of stationary additive Gaussian noise channels with feedback is characterized as the solution to a variational problem. Toward this end, it is proved that the optimal feedback coding scheme is stationary. When specialized to the first-order autoregressive moving-average noise spectrum, this variational characterization yields a closed-form expression for the feedback capacity. In particular, this result shows that the celebrated Schalkwijk-Kailath coding scheme achieves the feedback capacity for the first-order autoregressive moving-average Gaussian channel, resolving a long-standing open problem studied by Butman, Schalkwijk-Tiernan, Wolfowitz, Ozarow, Ordentlich, Yang-Kavcic-Tatikonda, and others.


1 Introduction and summary


We consider the additive Gaussian noise channel $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, where the additive Gaussian noise process $\{Z_i\}_{i=1}^\infty$ is stationary with $Z^n = (Z_1, \ldots, Z_n) \sim N_n(0, K_{Z,n})$ for each $n = 1, 2, \ldots$. We wish to communicate a message $W \in \{1, \ldots, 2^{nR}\}$ over the channel $Y^n = X^n + Z^n$. For block length $n$, we specify a $(2^{nR}, n)$ feedback code with codewords $X^n(W, Y^{n-1}) = (X_1(W), X_2(W, Y_1), \ldots, X_n(W, Y^{n-1}))$, $W = 1, \ldots, 2^{nR}$, satisfying the average power constraint $\frac{1}{n} \sum_{i=1}^n E X_i^2(W, Y^{i-1}) \le P$, and a decoding function $\hat W_n : \mathbb{R}^n \to \{1, \ldots, 2^{nR}\}$. The probability of error $P_e^{(n)}$ is defined by $P_e^{(n)} = \Pr\{\hat W_n(Y^n) \ne W\}$, where the message $W$ is uniformly distributed over $\{1, 2, \ldots, 2^{nR}\}$ and is independent of $Z^n$. We say that the rate $R$ is achievable if there exists a sequence of $(2^{nR}, n)$ codes with $P_e^{(n)} \to 0$ as $n \to \infty$. The feedback capacity $C_{FB}$ is defined as the supremum of all achievable rates. We also consider the case in which there is no feedback, corresponding to the codewords $X^n(W) = (X_1(W), \ldots, X_n(W))$ independent of the previous channel outputs. We define the nonfeedback capacity $C$, or the capacity in short, in a manner similar to the feedback case.

Shannon [10] showed that the nonfeedback capacity is achieved by water-filling on the noise spectrum, which is arguably one of the most beautiful results in information theory. More specifically, the capacity $C$ of the additive Gaussian noise channel $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, under the power constraint $P$, is given by



f'l EiailS^i").^ (ft) 
C = | I 2 1 ° g S 2 (e») a? (1) 

where S z {e ld ) is the power spectral density of the stationary noise process {Zi\ c *L l and 
the water-level A is chosen to satisfy 

max{0,A-^(e^)}— . (2) 

Although (1) and (2) give only a parametric characterization of the capacity $C(\lambda)$ under the power constraint $P(\lambda)$ for each parameter $\lambda > 0$, this solution is considered simple and elegant enough to be called closed-form.
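To make the parametric form of (1) and (2) concrete, the sketch below evaluates both integrals by the midpoint rule on a uniform $\theta$ grid and finds the water-level $\lambda$ for a target $P$ by bisection. It assumes $S_Z$ is supplied as a Python function of $\theta$ and reports capacity in nats; the function name and grid size are illustrative choices, not part of the original development.

```python
import math

def water_filling_capacity(S_Z, P, grid=4096, iters=200):
    """Nonfeedback capacity (nats per use) and water-level lambda from (1)-(2).

    S_Z is a function of theta; integrals w.r.t. d(theta)/(2 pi) use the midpoint rule.
    """
    thetas = [-math.pi + (k + 0.5) * 2 * math.pi / grid for k in range(grid)]
    s = [S_Z(t) for t in thetas]

    def power(lam):
        # Right-hand side of (2) for a trial water-level lam.
        return sum(max(0.0, lam - v) for v in s) / grid

    # power(0) = 0 <= P and power(max S_Z + P) >= P, so the root is bracketed.
    lo, hi = 0.0, max(s) + P
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if power(mid) < P:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # Equation (1) evaluated at the water-level found above.
    C = sum(0.5 * math.log(max(v, lam) / v) for v in s) / grid
    return C, lam
```

For white noise, $S_Z \equiv 1$, the water-level is $\lambda = 1 + P$ and the function returns $\frac{1}{2}\log(1 + P)$.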

For the case of feedback, no such elegant solution exists. Most notably, Cover and Pombra [2] characterized the $n$-block feedback capacity $C_{FB,n}$ for arbitrary time-varying Gaussian channels via the asymptotic equipartition property (AEP) for arbitrary nonstationary nonergodic Gaussian processes as

$$C_{FB,n} = \max \frac{1}{2} \log \frac{\det\big(K_{V,n} + (B_n + I) K_{Z,n} (B_n + I)'\big)^{1/n}}{\det(K_{Z,n})^{1/n}}, \qquad (3)$$

where the maximum is taken over all positive semidefinite matrices $K_{V,n}$ and all strictly lower triangular matrices $B_n$ of size $n \times n$ satisfying $\mathrm{tr}(K_{V,n} + B_n K_{Z,n} B_n') \le nP$. Note that we can also recover the nonfeedback case by taking $B_n = 0$. When specialized to stationary noise processes, the Cover-Pombra characterization gives the feedback capacity as a limiting expression

$$C_{FB} = \lim_{n\to\infty} C_{FB,n} = \lim_{n\to\infty} \max_{K_{V,n},\, B_n} \frac{1}{2} \log \frac{\det\big(K_{V,n} + (B_n + I) K_{Z,n} (B_n + I)'\big)^{1/n}}{\det(K_{Z,n})^{1/n}}. \qquad (4)$$



Despite its generality, the Cover-Pombra formulation of the feedback capacity falls short of what we can call a closed-form solution. It is very difficult, if not impossible, to obtain an analytic expression for the optimal $(K_{V,n}^*, B_n^*)$ in (3) for each $n$. Furthermore, the sequence of optimal $\{(K_{V,n}^*, B_n^*)\}_{n=1}^\infty$ is not necessarily consistent, that is, $(K_{V,n}^*, B_n^*)$ is not necessarily a subblock of $(K_{V,n+1}^*, B_{n+1}^*)$. Hence the characterization (3) in itself does not give much hint on the structure of the optimal $\{(K_{V,n}^*, B_n^*)\}_{n=1}^\infty$ achieving $C_{FB,n}$ or, more importantly, on its limiting behavior.
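For small $n$, the objective in (3) can at least be evaluated for a given candidate. The sketch below, assuming covariance matrices are given as nested Python lists, computes the objective of (3) and the normalized power $\mathrm{tr}(K_{V,n} + B_n K_{Z,n} B_n')/n$ for one pair $(K_{V,n}, B_n)$. It does not perform the maximization, which is exactly the hard part discussed above; the helper names are illustrative.

```python
import math

def logdet_spd(A):
    """log det of a symmetric positive definite matrix via Cholesky (pure Python)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def cover_pombra_objective(K_V, B, K_Z):
    """Objective of (3) and the power tr(K_V + B K_Z B')/n for one candidate (K_V, B_n)."""
    n = len(K_Z)
    for i in range(n):
        for j in range(i, n):
            assert B[i][j] == 0.0, "B_n must be strictly lower triangular"
    I_plus_B = [[B[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    K_Y = matmul(matmul(I_plus_B, K_Z), transpose(I_plus_B))
    K_Y = [[K_V[i][j] + K_Y[i][j] for j in range(n)] for i in range(n)]
    # (1/2) log(det^{1/n} ratio) = (log det K_Y - log det K_Z) / (2n)
    rate = (logdet_spd(K_Y) - logdet_spd(K_Z)) / (2.0 * n)
    BKB = matmul(matmul(B, K_Z), transpose(B))
    power = sum(K_V[i][i] + BKB[i][i] for i in range(n)) / n
    return rate, power
```

With $K_{V,n} = 0$ the objective is $0$ for every strictly lower triangular $B_n$, because $\det(I + B_n) = 1$: feeding back past noise alone conveys nothing about the message, so the fresh component $V^n$ is essential.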

In this paper, we take one step forward by proving the following result.

Theorem 1. The feedback capacity $C_{FB}$ of the Gaussian channel $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, under the power constraint $P$, is given by

$$C_{FB} = \sup_{S_V(e^{i\theta}),\, B(e^{i\theta})} \int_{-\pi}^{\pi} \frac{1}{2} \log \frac{S_V(e^{i\theta}) + |1 + B(e^{i\theta})|^2 S_Z(e^{i\theta})}{S_Z(e^{i\theta})} \, \frac{d\theta}{2\pi},$$

where $S_Z(e^{i\theta})$ is the power spectral density of the noise process $\{Z_i\}_{i=1}^\infty$ and the supremum is taken over all power spectral densities $S_V(e^{i\theta}) \ge 0$ and strictly causal filters $B(e^{i\theta}) = \sum_{k=1}^\infty b_k e^{ik\theta}$ satisfying the power constraint $\int_{-\pi}^{\pi} \big(S_V(e^{i\theta}) + |B(e^{i\theta})|^2 S_Z(e^{i\theta})\big) \, \frac{d\theta}{2\pi} \le P$.



Roughly speaking, this characterization shows the asymptotic optimality of a stationary solution $(K_{V,n}, B_n)$ in (3), and hence it can be viewed as a justification for interchanging the order of limit and maximum in (4).
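Because the objective in Theorem 1 is a single integral, any candidate pair $(S_V, B)$ can be scored directly. The sketch below restricts $B$ to a finite impulse response $b_1, \ldots, b_K$ (an assumption made only to keep the example finite; Theorem 1 allows infinitely many $b_k$) and approximates both the objective and the power by the midpoint rule. It evaluates, but does not maximize, the expression; names are illustrative.

```python
import cmath
import math

def variational_objective(S_V, b, S_Z, grid=4096):
    """Objective and power of Theorem 1 for a strictly causal FIR filter
    B(e^{i theta}) = sum_{k>=1} b[k-1] e^{i k theta}; integrals use d(theta)/(2 pi)."""
    rate = power = 0.0
    for m in range(grid):
        t = -math.pi + (m + 0.5) * 2 * math.pi / grid
        B = sum(bk * cmath.exp(1j * (k + 1) * t) for k, bk in enumerate(b))
        sz, sv = S_Z(t), S_V(t)
        rate += 0.5 * math.log((sv + abs(1 + B) ** 2 * sz) / sz)
        power += sv + abs(B) ** 2 * sz
    return rate / grid, power / grid
```

For white noise ($S_Z \equiv 1$), $S_V \equiv P$ with $B = 0$ gives $\frac{1}{2}\log(1+P)$ at power $P$. With $S_V \equiv 0$ and $B(e^{i\theta}) = 2e^{i\theta}$ the objective is $\log 2$ at power $4$, below $\frac{1}{2}\log 5$, as expected on a white channel.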

Since Theorem 1 gives a variational expression of the feedback capacity, it remains to characterize the optimal $(S_V^*(e^{i\theta}), B^*(e^{i\theta}))$. In this paper, we provide a sufficient condition for the optimal solution using elementary arguments. This result, when specialized to the first-order autoregressive (AR) noise spectrum $S_Z(e^{i\theta}) = 1/|1 + \beta e^{i\theta}|^2$, $-1 < \beta < 1$, yields a closed-form solution for the feedback capacity as $C_{FB} = -\log x_0$, where $x_0$ is the unique positive root of the fourth-order polynomial

$$P x^2 = \frac{1 - x^2}{(1 + |\beta| x)^2}.$$

This result positively answers the long-standing conjecture by Butman [?], Tiernan-Schalkwijk [?], and Wolfowitz [7]. In fact, we will obtain the feedback capacity formula for the first-order autoregressive moving average (ARMA) noise spectrum, generalizing the result in [?] and confirming a recent conjecture by Yang, Kavcic, and Tatikonda.
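The root $x_0$ is easy to compute: on $(0, 1)$ the left side $P x^2$ increases from $0$ to $P$ while the right side decreases from $1$ to $0$, so a bisection suffices. A minimal sketch, with capacity in nats and a function name chosen for illustration:

```python
import math

def ar1_feedback_capacity(P, beta):
    """C_FB = -log x0 (nats), x0 the root in (0, 1) of P x^2 = (1 - x^2) / (1 + |beta| x)^2."""
    f = lambda x: P * x * x - (1 - x * x) / (1 + abs(beta) * x) ** 2
    lo, hi = 0.0, 1.0  # f(0) = -1 < 0 and f(1) = P > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return -math.log(0.5 * (lo + hi))
```

Setting $\beta = 0$ recovers $P x_0^2 = 1 - x_0^2$, that is, $C_{FB} = \frac{1}{2}\log(1 + P)$, the white-noise capacity; the value depends on $\beta$ only through $|\beta|$.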

The rest of the paper is organized as follows. We prove Theorem 1 in the next section. In Section 3, we derive a sufficient condition for the optimal $(S_V^*(e^{i\theta}), B^*(e^{i\theta}))$ and apply this result to the first-order ARMA noise spectrum to obtain the closed-form feedback capacity. We also show that the Schalkwijk-Kailath-Butman coding scheme [?], [?], [3] achieves the feedback capacity of the first-order ARMA Gaussian channel.



2 Proof of Theorem 1 

We start from the Cover-Pombra formulation of the $n$-block feedback capacity $C_{FB,n}$ in (3). Tracing the development of Cover and Pombra [2] backwards, we express $C_{FB,n}$ as

$$C_{FB,n} = \max_{V^n + B_n Z^n} \frac{1}{n}\big(h(Y^n) - h(Z^n)\big) = \max_{V^n + B_n Z^n} \frac{1}{n} I(V^n; Y^n),$$

where the maximization is over all $X^n$ of the form $X^n = V^n + B_n Z^n$, resulting in $Y^n = V^n + (I + B_n) Z^n$, with strictly lower-triangular $B_n$ and multivariate Gaussian $V^n$, independent of $Z^n$, satisfying the power constraint $E \sum_{i=1}^n X_i^2 \le nP$.
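Define

$$\tilde C_{FB} = \sup \int_{-\pi}^{\pi} \frac{1}{2} \log \frac{S_V(e^{i\theta}) + |1 + B(e^{i\theta})|^2 S_Z(e^{i\theta})}{S_Z(e^{i\theta})} \, \frac{d\theta}{2\pi},$$

where $S_Z(e^{i\theta})$ is the power spectral density of the noise process $\{Z_i\}_{i=-\infty}^\infty$ and the supremum is taken over all power spectral densities $S_V(e^{i\theta}) \ge 0$ and strictly causal filters $B(e^{i\theta}) = \sum_{k=1}^\infty b_k e^{ik\theta}$ satisfying the power constraint $\int_{-\pi}^{\pi} \big(S_V(e^{i\theta}) + |B(e^{i\theta})|^2 S_Z(e^{i\theta})\big) \, d\theta \le 2\pi P$. In the light of the Szegő-Kolmogorov-Krein theorem, we can also express $\tilde C_{FB}$ as

$$\tilde C_{FB} = \sup \; h(\mathcal{Y}) - h(\mathcal{Z}),$$

where $h(\mathcal{Y})$ and $h(\mathcal{Z})$ denote the entropy rates of the output and noise processes, and the supremum is taken over all stationary Gaussian processes $\{X_i\}_{i=-\infty}^\infty$ of the form $X_i = V_i + \sum_{k=1}^\infty b_k Z_{i-k}$, where $\{V_i\}_{i=-\infty}^\infty$ is stationary and independent of $\{Z_i\}_{i=-\infty}^\infty$, such that $E X_0^2 \le P$. We will prove that $C_{FB} = \tilde C_{FB}$.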
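The Szegő-Kolmogorov-Krein theorem identifies the entropy rate of a stationary Gaussian process with spectral density $S$ as $\frac{1}{2}\log(2\pi e) + \frac{1}{2}\int_{-\pi}^{\pi} \log S \, \frac{d\theta}{2\pi}$; equivalently, $\frac{1}{n}\log\det K_n \to \int_{-\pi}^{\pi} \log S \, \frac{d\theta}{2\pi}$. The sketch below checks this numerically for the AR(1) noise $Z_i + \beta Z_{i-1} = U_i$ with unit-variance innovations (an example chosen for illustration), whose autocovariance is $r(k) = (-\beta)^{|k|}/(1 - \beta^2)$.

```python
import math

def logdet_spd(A):
    """log det of a symmetric positive definite matrix via Cholesky (pure Python)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def ar1_logdet_rate(beta, n):
    """(1/n) log det K_{Z,n} for the AR(1) noise Z_i + beta Z_{i-1} = U_i, Var(U_i) = 1."""
    r = [(-beta) ** k / (1 - beta ** 2) for k in range(n)]   # autocovariance r(k)
    K = [[r[abs(i - j)] for j in range(n)] for i in range(n)]
    return logdet_spd(K) / n

def log_spectrum_integral(beta, grid=2048):
    """Midpoint-rule value of int log S_Z d(theta)/(2 pi) with S_Z = 1/|1 + beta e^{i theta}|^2."""
    total = 0.0
    for m in range(grid):
        t = -math.pi + (m + 0.5) * 2 * math.pi / grid
        total -= math.log(1 + beta * beta + 2 * beta * math.cos(t))
    return total / grid
```

Here $\det K_{Z,n} = 1/(1 - \beta^2)$ for every $n$, since each new sample has unit innovation variance, so $\frac{1}{n}\log\det K_{Z,n} = -\log(1-\beta^2)/n \to 0$, matching the zero value of the log-spectrum integral.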



We first show that $C_{FB,n} \le \tilde C_{FB}$ for all $n$. Fix $n$ and let $(K_{V,n}^*, B_n^*)$ achieve $C_{FB,n}$. Consider a process $\{V_i\}_{i=-\infty}^\infty$ that is independent of $\{Z_i\}_{i=-\infty}^\infty$ and blockwise i.i.d. with $V_{kn+1}^{(k+1)n} \sim N_n(0, K_{V,n}^*)$, $k = 0, \pm 1, \pm 2, \ldots$. Define a process $\{X_i\}_{i=-\infty}^\infty$ as $X_{kn+1}^{(k+1)n} = V_{kn+1}^{(k+1)n} + B_n^* Z_{kn+1}^{(k+1)n}$ for all $k$. Similarly, let $Y_i = X_i + Z_i$, $-\infty < i < \infty$, be the corresponding output process through the stationary Gaussian channel. Note that $Y_{kn+1}^{(k+1)n} = V_{kn+1}^{(k+1)n} + (I + B_n^*) Z_{kn+1}^{(k+1)n}$ for all $k$. For each $t = 0, 1, \ldots, n-1$, define a process $\{V_i(t)\}_{i=-\infty}^\infty$ as $V_i(t) = V_{t+i}$ for all $i$, and similarly define $\{X_i(t)\}_{i=-\infty}^\infty$, $\{Y_i(t)\}_{i=-\infty}^\infty$, and $\{Z_i(t)\}_{i=-\infty}^\infty$.

Note that $Y_i(t) = X_i(t) + Z_i(t)$ for all $i$ and all $t = 0, 1, \ldots, n-1$, but $X_1^n(t)$ is not equal to $V_1^n(t) + B_n^* Z_1^n(t)$ in general.

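From the independence of $V_1^n$ and $V_{n+1}^{2n}$, we can easily check that

$$2n C_{FB,n} = I(V_1^n; Y_1^n) + I(V_{n+1}^{2n}; Y_{n+1}^{2n}) \le I(V_1^{2n}; Y_1^{2n}) = h(Y_1^{2n}) - h(Z_1^{2n}).$$

By repeating the same argument, we get

$$kn\, C_{FB,n} \le h(Y_1^{kn}) - h(Z_1^{kn})$$

for all $k$. Hence, for all $m = 1, 2, \ldots$ and each $t = 0, \ldots, n-1$, we have

$$C_{FB,n} \le \frac{1}{m}\big(h(Y_1^m(t)) - h(Z_1^m(t))\big) + \epsilon_m = \frac{1}{m}\big(h(Y_1^m(t)) - h(Z_1^m)\big) + \epsilon_m,$$

where the equality follows from the stationarity of $\{Z_i\}$, and $\epsilon_m$ absorbs the edge effect and vanishes uniformly in $t$ as $m \to \infty$.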

Now we introduce a random variable $T$ uniform on $\{0, 1, \ldots, n-1\}$ and independent of $\{V_i, X_i, Y_i, Z_i\}_{i=-\infty}^\infty$. It is easy to check the following:

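(I) $\{V_i(T), X_i(T), Y_i(T), Z_i(T)\}_{i=-\infty}^\infty$ is stationary with $Y_i(T) = X_i(T) + Z_i(T)$.

(II) $\{X_i(T)\}_{i=-\infty}^\infty$ satisfies the power constraint

$$E X_0^2(T) = E\big[E(X_0^2(T) \mid T)\big] = \frac{1}{n}\, \mathrm{tr}\big(K_{V,n}^* + B_n^* K_{Z,n} (B_n^*)'\big) \le P.$$

(III) $\{V_i(T)\}_{i=-\infty}^\infty$ and $\{Z_i(T)\}_{i=-\infty}^\infty$ are orthogonal; i.e., $E V_i(T) Z_j(T) = 0$ for all $i, j$.

(IV) Although there is no linear relationship between $\{X_i(T)\}$ and $\{Z_i(T)\}$, $\{X_i(T)\}$ still depends on $\{Z_i(T)\}$ in a strictly causal manner. More precisely, for all $i \le j$,

$$\begin{aligned}
E\big(X_i(T) Z_j(T) \mid Z_{-\infty}^{i-1}(T)\big)
&= E\Big(E\big(X_i(T) Z_j(T) \mid Z_{-\infty}^{i-1}(T), T\big) \,\Big|\, Z_{-\infty}^{i-1}(T)\Big) \\
&= E\Big(E\big(X_i(T) \mid Z_{-\infty}^{i-1}(T), T\big)\, E\big(Z_j(T) \mid Z_{-\infty}^{i-1}(T), T\big) \,\Big|\, Z_{-\infty}^{i-1}(T)\Big) \\
&= E\Big(E\big(X_i(T) \mid Z_{-\infty}^{i-1}(T), T\big)\, E\big(Z_j(T) \mid Z_{-\infty}^{i-1}(T)\big) \,\Big|\, Z_{-\infty}^{i-1}(T)\Big) \\
&= E\big(X_i(T) \mid Z_{-\infty}^{i-1}(T)\big)\, E\big(Z_j(T) \mid Z_{-\infty}^{i-1}(T)\big),
\end{aligned}$$

and for all $i$,

$$\mathrm{Var}\big(X_i(T) - V_i(T) \mid Z_{-\infty}^{i-1}(T)\big) = E\Big(\mathrm{Var}\big(X_i(T) - V_i(T) \mid Z_{-\infty}^{i-1}(T), T\big) \,\Big|\, Z_{-\infty}^{i-1}(T)\Big) = 0.$$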
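Property (II) rests on the fact that randomizing the phase $T$ averages a block covariance along its diagonals. The sketch below computes the autocovariance of a blockwise i.i.d. zero-mean sequence with block covariance $K$ (size $n \times n$) after a uniform random shift $T$. It applies to the $\{V_i\}$ part of the construction, whose blocks are independent (the $X$ part also inherits cross-block correlation from $Z$, which this toy ignores); names are illustrative.

```python
def shifted_autocov(K, h, i=1):
    """E[W_i(T) W_{i+h}(T)] for a blockwise i.i.d. zero-mean sequence with block
    covariance K (blocks W_{kn+1..(k+1)n}) and T uniform on {0, ..., n-1}."""
    n = len(K)

    def cov(a, b):
        # Covariance of W_a and W_b (1-based indices): nonzero only within one block.
        if (a - 1) // n != (b - 1) // n:
            return 0.0
        return K[(a - 1) % n][(b - 1) % n]

    # Average over the n equally likely shifts t = 0, ..., n-1.
    return sum(cov(t + i, t + i + h) for t in range(n)) / n
```

The result does not depend on $i$ (stationarity, as in (I)), lag $0$ gives $\mathrm{tr}(K)/n$ (the quantity in (II)), and lag $h$ gives $\frac{1}{n}\sum_{p=1}^{n-h} K_{p,p+h}$, vanishing for $h \ge n$.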



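Finally, define $\{\tilde V_i, \tilde X_i, \tilde Y_i, \tilde Z_i\}_{i=-\infty}^\infty$ to be a jointly Gaussian stationary process with the same mean and autocorrelation as $\{V_i(T), X_i(T), Y_i(T), Z_i(T)\}_{i=-\infty}^\infty$. It is easy to check that $\{\tilde V_i, \tilde X_i, \tilde Y_i, \tilde Z_i\}$ also satisfies properties (I)-(IV) and hence that $\{\tilde V_i\}$ and $\{\tilde Z_i\}$ are independent. It follows from these properties and the Gaussianity of $\{\tilde V_i, \tilde X_i, \tilde Y_i, \tilde Z_i\}$ that there exists a sequence $\{b_k\}_{k=1}^\infty$ such that $\tilde X_i = \tilde V_i + \sum_{k=1}^\infty b_k \tilde Z_{i-k}$. Thus we have

$$\begin{aligned}
C_{FB,n} &\le \frac{1}{m}\big(h(Y_1^m(T) \mid T) - h(Z_1^m)\big) + \epsilon_m \\
&\le \frac{1}{m}\big(h(Y_1^m(T)) - h(Z_1^m)\big) + \epsilon_m \\
&\le \frac{1}{m}\big(h(\tilde Y_1^m) - h(Z_1^m)\big) + \epsilon_m,
\end{aligned}$$

where the first inequality follows by averaging the previous bound over $T$, the second because conditioning reduces entropy, and the third because the Gaussian distribution maximizes differential entropy for a given covariance.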
m 



By letting $m \to \infty$ and using the definition of $\tilde C_{FB}$, we obtain

$$C_{FB,n} \le h(\tilde{\mathcal{Y}}) - h(\mathcal{Z}) \le \tilde C_{FB}. \qquad (5)$$



For the other direction of the inequality, we use the notation $\tilde C_{FB}(P)$ and $C_{FB,n}(P)$ to stress the dependence on the power constraint $P$. Given $\epsilon > 0$, let $\{X_i = V_i + \sum_{k=1}^\infty b_k Z_{i-k}\}_{i=-\infty}^\infty$ achieve $\tilde C_{FB}(P) - \epsilon$ under the power constraint $P$. The corresponding channel output is given as

$$Y_i = V_i + Z_i + \sum_{k=1}^\infty b_k Z_{i-k} \qquad (6)$$

for all $i = 0, \pm 1, \pm 2, \ldots$.

Now, for each $m = 1, 2, \ldots$, we define a single-sided nonstationary process $\{X_i(m)\}_{i=1}^\infty$ in the following way:

$$X_i(m) = \begin{cases} U_i + V_i + \sum_{k=1}^{i-1} b_k Z_{i-k}, & i \le m, \\ U_i + V_i + \sum_{k=1}^{m} b_k Z_{i-k}, & i > m, \end{cases}$$

where $U_1, U_2, \ldots$ are i.i.d. $\sim N(0, \epsilon)$. Thus, $X_i(m)$ depends causally on $Z_1^{i-1}$ for all $i$ and $m$. Let $\{Y_i(m)\}_{i=1}^\infty$ be the corresponding channel output $Y_i(m) = X_i(m) + Z_i$, $i = 1, 2, \ldots$, for each $m = 1, 2, \ldots$. We can show that there exists an $m^*$ so that



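$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^n E X_i^2(m^*) \le P + 2\epsilon$$

and

$$\lim_{n\to\infty} \frac{1}{n} h\big(Y^n(m^*)\big) \ge h(\mathcal{Y}) - \epsilon, \qquad (7)$$

where $h(\mathcal{Y})$ is the entropy rate of the stationary process defined in (6). Consequently, for $n$ sufficiently large,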



$$\sum_{i=1}^n E X_i^2(m^*) \le n(P + 3\epsilon)$$

and

$$\frac{1}{n}\big(h(Y^n(m^*)) - h(Z^n)\big) \ge \tilde C_{FB}(P) - 2\epsilon.$$



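Since $X^n(m^*)$ has the form $V'^n + B_n Z^n$ allowed in (3), with $V'^n = U^n + V^n$ Gaussian and $B_n$ strictly lower triangular, we can conclude that

$$C_{FB,n}(P + 3\epsilon) \ge \tilde C_{FB}(P) - 2\epsilon$$

for $n$ sufficiently large. Finally, using the continuity of $C_{FB}(P) = \lim_{n\to\infty} C_{FB,n}(P)$ in $P$, we let $\epsilon \to 0$ to get $C_{FB}(P) \ge \tilde C_{FB}(P)$, which, combined with (5), implies that

$$C_{FB}(P) = \tilde C_{FB}(P).$$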



3 Example: First-order ARMA noise spectrum 

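With the ultimate goal of an explicit characterization of $C_{FB}$ as a function of $S_Z$ and $P$, we wish to solve the optimization problem

$$\begin{aligned}
\text{maximize} \quad & \int_{-\pi}^{\pi} \log\big(S_V(e^{i\theta}) + |1 + B(e^{i\theta})|^2 S_Z(e^{i\theta})\big) \, \frac{d\theta}{2\pi} \\
\text{subject to} \quad & B(e^{i\theta}) \text{ strictly causal}, \\
& S_V(e^{i\theta}) \ge 0, \\
& \int_{-\pi}^{\pi} \big(S_V(e^{i\theta}) + |B(e^{i\theta})|^2 S_Z(e^{i\theta})\big) \, \frac{d\theta}{2\pi} \le P.
\end{aligned} \qquad (8)$$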

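Suppose that $S_Z(e^{i\theta})$ is bounded away from zero. Then, under the change of variable

$$S_Y(e^{i\theta}) = S_V(e^{i\theta}) + |1 + B(e^{i\theta})|^2 S_Z(e^{i\theta}),$$

we rewrite (8) as

$$\begin{aligned}
\text{maximize} \quad & \int_{-\pi}^{\pi} \log S_Y(e^{i\theta}) \, \frac{d\theta}{2\pi} \\
\text{subject to} \quad & B(e^{i\theta}) \text{ strictly causal}, \\
& S_Y(e^{i\theta}) \ge |1 + B(e^{i\theta})|^2 S_Z(e^{i\theta}), \\
& \int_{-\pi}^{\pi} \Big(S_Y(e^{i\theta}) - \big(B(e^{i\theta}) + B(e^{-i\theta}) + 1\big) S_Z(e^{i\theta})\Big) \, \frac{d\theta}{2\pi} \le P.
\end{aligned} \qquad (9)$$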
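The rewriting of the power constraint in (9) uses $|1 + B|^2 = 1 + B + \bar B + |B|^2$ together with $B(e^{-i\theta}) = \overline{B(e^{i\theta})}$, which holds when the coefficients $b_k$ are real (assumed here). A quick numerical check of the pointwise identity $S_V + |B|^2 S_Z = S_Y - (B(e^{i\theta}) + B(e^{-i\theta}) + 1) S_Z$, for an arbitrary FIR filter and spectra chosen only for illustration:

```python
import cmath
import math

def check_power_identity(b, S_Z, S_V, grid=256):
    """max |(S_V + |B|^2 S_Z) - (S_Y - (B + conj(B) + 1) S_Z)| over a theta grid,
    where S_Y = S_V + |1 + B|^2 S_Z and B has real coefficients b_1, b_2, ..."""
    worst = 0.0
    for m in range(grid):
        t = -math.pi + (m + 0.5) * 2 * math.pi / grid
        B = sum(bk * cmath.exp(1j * (k + 1) * t) for k, bk in enumerate(b))
        sz, sv = S_Z(t), S_V(t)
        S_Y = sv + abs(1 + B) ** 2 * sz
        lhs = sv + abs(B) ** 2 * sz
        # B(e^{-i theta}) equals conj(B(e^{i theta})) for real coefficients.
        rhs = S_Y - ((B + B.conjugate() + 1) * sz).real
        worst = max(worst, abs(lhs - rhs))
    return worst
```

The returned discrepancy is at the level of floating-point rounding, confirming that (8) and (9) describe the same feasible set once $S_V \ge 0$ is traded for $S_Y \ge |1+B|^2 S_Z$.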



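Take any $\nu > 0$, $\phi, \psi_1 \in L_\infty$, and $\psi_2, \psi_3 \in L_1$ such that $\phi(e^{i\theta}) > 0$, $\log \phi \in L_1$, $\psi_1(e^{i\theta}) = \nu - \phi(e^{i\theta}) > 0$,

$$\begin{pmatrix} \psi_1(e^{i\theta}) & \psi_2(e^{i\theta}) \\ \overline{\psi_2(e^{i\theta})} & \psi_3(e^{i\theta}) \end{pmatrix} \succeq 0,$$

and $\Lambda(e^{i\theta}) := \psi_2(e^{i\theta}) + \nu S_Z(e^{i\theta}) \in L_1$ is anticausal. Since any feasible $B(e^{i\theta})$ and $S_Y(e^{i\theta})$ satisfy (by a Schur complement argument applied to $S_Y \ge |1+B|^2 S_Z$)

$$\begin{pmatrix} S_Y(e^{i\theta}) & 1 + B(e^{i\theta}) \\ 1 + \overline{B(e^{i\theta})} & S_Z^{-1}(e^{i\theta}) \end{pmatrix} \succeq 0,$$

and the trace of the product of two positive semidefinite matrices is nonnegative, we have

$$\mathrm{tr}\left(\begin{pmatrix} \psi_1 & \psi_2 \\ \bar\psi_2 & \psi_3 \end{pmatrix} \begin{pmatrix} S_Y & 1 + B \\ 1 + \bar B & S_Z^{-1} \end{pmatrix}\right) = \psi_1 S_Y + \bar\psi_2 (1 + B) + \psi_2 (1 + \bar B) + \psi_3 S_Z^{-1} \ge 0.$$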



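From the fact that $\log x \le x - 1$ for all $x > 0$, we get the inequality

$$\begin{aligned}
\log S_Y &\le -\log \phi + \phi S_Y - 1 \\
&= -\log \phi + \nu S_Y - \psi_1 S_Y - 1 \\
&\le -\log \phi + \nu S_Y + \bar\psi_2 (1 + B) + \psi_2 (1 + \bar B) + \psi_3 S_Z^{-1} - 1.
\end{aligned} \qquad (10)$$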



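Further, since $\Lambda \in L_1$ is anticausal and $B \in L_\infty$ is strictly causal, $\Lambda \bar B \in L_1$ is strictly anticausal and

$$\int_{-\pi}^{\pi} \Lambda(e^{i\theta}) \overline{B(e^{i\theta})} \, \frac{d\theta}{2\pi} = \int_{-\pi}^{\pi} \overline{\Lambda(e^{i\theta})} B(e^{i\theta}) \, \frac{d\theta}{2\pi} = 0. \qquad (11)$$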
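By integrating both sides of (10), we get

$$\begin{aligned}
\int_{-\pi}^{\pi} \log S_Y \, \frac{d\theta}{2\pi}
&\le \int_{-\pi}^{\pi} \big(-\log \phi + \nu S_Y + \bar\psi_2 (1 + B) + \psi_2 (1 + \bar B) + \psi_3 S_Z^{-1} - 1\big) \frac{d\theta}{2\pi} \\
&\le \int_{-\pi}^{\pi} \big(-\log \phi + \nu\big((B + \bar B + 1) S_Z + P\big) + \bar\psi_2 (1 + B) + \psi_2 (1 + \bar B) + \psi_3 S_Z^{-1} - 1\big) \frac{d\theta}{2\pi} \\
&= \int_{-\pi}^{\pi} \big(-\log \phi + \bar\psi_2 + \psi_2 + \psi_3 S_Z^{-1} + \nu (S_Z + P) - 1 + \bar\Lambda B + \Lambda \bar B\big) \frac{d\theta}{2\pi} \\
&= \int_{-\pi}^{\pi} \big(-\log \phi + \bar\psi_2 + \psi_2 + \psi_3 S_Z^{-1} + \nu (S_Z + P) - 1\big) \frac{d\theta}{2\pi},
\end{aligned} \qquad (12)$$

where the second inequality follows from the power constraint in (9) and the last equality follows from (11).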
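Identity (11) is a statement about Fourier coefficients: the integral picks out the zeroth coefficient, and the product of an anticausal $\Lambda$ with the conjugate of a strictly causal $B$ contains only strictly negative frequencies. The sketch below checks this for illustrative trigonometric polynomials (all coefficients are made up), using the fact that an $N$-point trapezoidal rule integrates trigonometric polynomials of degree below $N$ exactly.

```python
import cmath
import math

def zeroth_coefficient(f, grid=64):
    """(1/2pi) * integral of f over [0, 2pi) by the trapezoidal rule; exact for
    trigonometric polynomials of degree < grid."""
    return sum(f(2 * math.pi * m / grid) for m in range(grid)) / grid

# Illustrative anticausal Lambda (powers e^{-ik theta}, k >= 0) and strictly causal B.
lam_coeffs = [0.8, -0.4, 0.25]   # coefficients of e^{0}, e^{-i theta}, e^{-2i theta}
b_coeffs = [0.5, 0.3, -0.1]      # b_1, b_2, b_3

def Lam(t):
    return sum(c * cmath.exp(-1j * k * t) for k, c in enumerate(lam_coeffs))

def B(t):
    return sum(c * cmath.exp(1j * (k + 1) * t) for k, c in enumerate(b_coeffs))

val1 = zeroth_coefficient(lambda t: Lam(t) * B(t).conjugate())   # int Lambda conj(B)
val2 = zeroth_coefficient(lambda t: Lam(t).conjugate() * B(t))   # int conj(Lambda) B
```

Both values vanish up to rounding. Dropping the conjugate breaks the identity: for these coefficients $\int \Lambda B \, \frac{d\theta}{2\pi} = (-0.4)(0.5) + (0.25)(0.3) = -0.125$.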

Checking the equality conditions in (12), we find the following sufficient condition for the optimality of a specific $(S_Y(e^{i\theta}), B(e^{i\theta}))$.

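Lemma 1. Suppose $S_Z(e^{i\theta})$ is bounded away from zero. Suppose $B(e^{i\theta}) \in L_\infty$ is strictly causal with

$$\int_{-\pi}^{\pi} |B(e^{i\theta})|^2 S_Z(e^{i\theta}) \, \frac{d\theta}{2\pi} = P. \qquad (13)$$

If there exists $\lambda > 0$ such that

$$\lambda \le \operatorname*{ess\,inf}_{\theta \in [-\pi, \pi)} |1 + B(e^{i\theta})|^2 S_Z(e^{i\theta})$$

and that

$$B(e^{i\theta}) S_Z(e^{i\theta}) - \frac{\lambda}{1 + B(e^{-i\theta})} \in L_1$$

is anticausal, then $B(e^{i\theta})$ along with $S_Y(e^{i\theta}) = |1 + B(e^{i\theta})|^2 S_Z(e^{i\theta})$ attains the feedback capacity.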
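As a sanity check of Lemma 1, consider white noise, $S_Z \equiv 1$, where the answer $\frac{1}{2}\log(1+P)$ is known. Assume, for this illustration only, the first-order filter $B(e^{i\theta}) = y e^{i\theta}/(1 - x e^{i\theta})$ with $x = (1+P)^{-1/2}$ and $y = (x^2 - 1)/x$. Then $|1 + B|^2 \equiv 1/x^2$, so $\lambda = 1/x^2$ meets the essential-infimum condition with equality, and $B - \lambda/(1 + B(e^{-i\theta})) \equiv -1$ is trivially anticausal. The sketch below verifies these facts and the power condition (13) numerically; the function name is illustrative.

```python
import cmath
import math

def white_noise_lemma_check(P, grid=4096):
    """Return (power, max|S_Y - lambda|, max|D + 1|) for the white-noise example,
    where D = B(e^{i theta}) - lambda / (1 + B(e^{-i theta}))."""
    x = 1.0 / math.sqrt(1.0 + P)
    y = (x * x - 1.0) / x
    lam = 1.0 / (x * x)
    power, max_sy_dev, max_anticausal_dev = 0.0, 0.0, 0.0
    for m in range(grid):
        t = -math.pi + (m + 0.5) * 2 * math.pi / grid
        z = cmath.exp(1j * t)
        B = y * z / (1 - x * z)
        B_rev = B.conjugate()            # B(e^{-i theta}) for real coefficients
        power += abs(B) ** 2             # S_Z = 1, midpoint rule for (13)
        max_sy_dev = max(max_sy_dev, abs(abs(1 + B) ** 2 - lam))
        D = B - lam / (1 + B_rev)        # expected to be the constant -1
        max_anticausal_dev = max(max_anticausal_dev, abs(D + 1))
    return power / grid, max_sy_dev, max_anticausal_dev
```

The power integral equals $y^2/(1-x^2) = (1-x^2)/x^2 = P$, and $-\log x = \frac{1}{2}\log(1+P)$, as the lemma predicts.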

Now we turn our attention to the first-order autoregressive moving average noise spectrum $S_Z$, defined by

$$S_Z(e^{i\theta}) = \left|\frac{1 + \alpha e^{i\theta}}{1 + \beta e^{i\theta}}\right|^2, \qquad \alpha \in [-1, 1], \; \beta \in (-1, 1). \qquad (14)$$



This spectral density corresponds to the stationary noise process defined by $Z_i + \beta Z_{i-1} = U_i + \alpha U_{i-1}$, where $\{U_i\}_{i=-\infty}^\infty$ is a white Gaussian process with zero mean and unit variance. We find the feedback capacity of the first-order ARMA Gaussian channel in the following theorem.
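A quick consistency check between the spectral form (14) and the difference equation: for $|\beta| < 1$, the variance of $Z_i$ obtained by integrating $S_Z$ should match the standard ARMA(1,1) formula $(1 + \alpha^2 - 2\alpha\beta)/(1 - \beta^2)$. The sketch uses illustrative parameter values and function names.

```python
import math

def arma_psd(t, alpha, beta):
    """S_Z(e^{i theta}) = |1 + alpha e^{i theta}|^2 / |1 + beta e^{i theta}|^2, as in (14)."""
    num = 1 + alpha * alpha + 2 * alpha * math.cos(t)
    den = 1 + beta * beta + 2 * beta * math.cos(t)
    return num / den

def arma_variance_numeric(alpha, beta, grid=4096):
    # E Z_0^2 = integral of S_Z over [-pi, pi) with respect to d(theta)/(2 pi).
    return sum(arma_psd(-math.pi + (m + 0.5) * 2 * math.pi / grid, alpha, beta)
               for m in range(grid)) / grid

def arma_variance_closed_form(alpha, beta):
    # Standard ARMA(1,1) variance for Z_i + beta Z_{i-1} = U_i + alpha U_{i-1}.
    return (1 + alpha * alpha - 2 * alpha * beta) / (1 - beta * beta)
```

For $\alpha = 0$ this reduces to the AR(1) variance $1/(1-\beta^2)$, and for $\beta = 0$ to the MA(1) variance $1 + \alpha^2$.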

Theorem 2. Suppose the noise process $\{Z_i\}_{i=1}^\infty$ has the power spectral density $S_Z$ defined in (14). Then, the feedback capacity $C_{FB}$ of the Gaussian channel $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, under the power constraint $P$, is given by

$$C_{FB} = -\log x_0,$$

where $x_0$ is the unique positive root of the fourth-order polynomial

$$P x^2 = \frac{(1 - x^2)(1 + \sigma \alpha x)^2}{(1 + \sigma \beta x)^2} \qquad (15)$$

and

$$\sigma = \operatorname{sgn}(\beta - \alpha) = \begin{cases} 1, & \beta \ge \alpha, \\ -1, & \beta < \alpha. \end{cases}$$
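Computing $C_{FB}$ from Theorem 2 only requires the root of (15) in $(0, 1)$. The right side of (15) equals $1$ at $x = 0$ and vanishes at $x = 1$, while $P x^2$ goes from $0$ to $P$, so the root is bracketed and bisection applies. The sketch below assumes $|\alpha| \le 1$ and $|\beta| < 1$ as in (14), reports nats, and relies on the theorem's uniqueness claim rather than checking it; the function name is illustrative.

```python
import math

def arma1_feedback_capacity(P, alpha, beta, iters=200):
    """C_FB = -log x0 (nats) from (15): P x^2 = (1 - x^2)(1 + s*alpha*x)^2 / (1 + s*beta*x)^2,
    with s = sgn(beta - alpha) taken as +1 when beta >= alpha."""
    s = 1.0 if beta >= alpha else -1.0
    f = lambda x: P * x * x - (1 - x * x) * (1 + s * alpha * x) ** 2 / (1 + s * beta * x) ** 2
    lo, hi = 0.0, 1.0  # f(0) = -1 < 0 and f(1) = P > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return -math.log(0.5 * (lo + hi))
```

For $\alpha = \beta$ the spectrum is white and the function returns $\frac{1}{2}\log(1+P)$; for $\alpha = 0$ it reduces to the AR(1) equation $P x^2 = (1-x^2)/(1+|\beta| x)^2$ stated in the introduction.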



Proof sketch. Without loss of generality, we assume that $|\alpha| < 1$. The case $|\alpha| = 1$ can be handled by a simple perturbation argument. When $|\alpha| < 1$, $S_Z(e^{i\theta})$ is bounded away from zero, so that we can apply Lemma 1.

Here is the bare-bones summary of the proof. We take the feedback filter of the form

$$B(z) = \frac{1 + \beta z}{1 + \alpha z} \cdot \frac{y z}{1 - \sigma x z}, \qquad (16)$$

where $x \in (0, 1)$ is an arbitrary parameter corresponding to each power constraint $P \in (0, \infty)$ under the choice of

$$y = \frac{x^2 - 1}{\sigma x} \cdot \frac{1 + \sigma \alpha x}{1 + \sigma \beta x}.$$

Then, we can show that $B(z)$ satisfies the sufficient condition in Lemma 1 under the power constraint

$$P = \int_{-\pi}^{\pi} |B(e^{i\theta})|^2 S_Z(e^{i\theta}) \, \frac{d\theta}{2\pi} = \int_{-\pi}^{\pi} \frac{y^2}{|1 - \sigma x e^{i\theta}|^2} \, \frac{d\theta}{2\pi} = \frac{y^2}{1 - x^2};$$

substituting the value of $y$ shows that this relation is exactly (15). The rest of the proof is the actual implementation of this idea. We skip the details. □
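Parts of the proof sketch can be checked numerically. The code below is a sketch under the assumptions $|\alpha| < 1$ and $|\beta| < 1$ (so that $S_Z$ is bounded away from zero): it solves (15) for $x$, builds the filter (16) with the stated choice of $y$, and integrates on a uniform $\theta$ grid. It confirms that the power $\int |B|^2 S_Z \, \frac{d\theta}{2\pi}$ equals $P$ and that, with $S_V = 0$, the objective of Theorem 1 equals $-\log x$. It does not verify the anticausality condition of Lemma 1; the function name is illustrative.

```python
import cmath
import math

def check_arma1_filter(P, alpha, beta, grid=4096):
    """For the filter (16), return (power, rate, -log x) where
    power = int |B|^2 S_Z and rate = int (1/2) log(|1 + B|^2), both w.r.t. d(theta)/(2 pi)."""
    s = 1.0 if beta >= alpha else -1.0
    f = lambda x: P * x * x - (1 - x * x) * (1 + s * alpha * x) ** 2 / (1 + s * beta * x) ** 2
    lo, hi = 0.0, 1.0
    for _ in range(200):                       # root x of (15) by bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    x = 0.5 * (lo + hi)
    y = (x * x - 1) / (s * x) * (1 + s * alpha * x) / (1 + s * beta * x)
    power = rate = 0.0
    for m in range(grid):
        z = cmath.exp(1j * (-math.pi + (m + 0.5) * 2 * math.pi / grid))
        B = (1 + beta * z) / (1 + alpha * z) * y * z / (1 - s * x * z)
        S_Z = abs(1 + alpha * z) ** 2 / abs(1 + beta * z) ** 2
        power += abs(B) ** 2 * S_Z
        rate += 0.5 * math.log(abs(1 + B) ** 2)   # S_V = 0, so S_Y / S_Z = |1 + B|^2
    return power / grid, rate / grid, -math.log(x)
```

The agreement of the rate with $-\log x$ reflects that $1 + B(z)$ has exactly one zero inside the unit disk, at $z = \sigma x$, so Jensen's formula gives $\int \log|1+B|^2 \frac{d\theta}{2\pi} = -2\log x$.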

Although the variational formulation of the feedback capacity (Theorem 1), along with the sufficient condition for the optimal solution (Lemma 1), leads to the simple closed-form expression for the ARMA(1) feedback capacity (Theorem 2), one might still be left with a somewhat uneasy feeling, due mostly to the algebraic and indirect nature of the proof. We now take a more constructive approach and interpret the properties of the optimal feedback filter $B^*$.

Consider the following coding scheme. Let $V \sim N(0, 1)$. Over the channel $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, the transmitter initially sends $X_1 = V$ and subsequently refines the receiver's knowledge by sending

$$X_n = (\sigma x)^{-(n-1)} (V - \hat V_{n-1}), \qquad (17)$$

where $x$ is the unique positive root of (15) and $\hat V_n = E(V \mid Y_1, \ldots, Y_n)$ is the minimum mean-squared error estimate of $V$ given the channel output up to time $n$. We will show that

$$\liminf_{n\to\infty} \frac{1}{n} I(V; Y^n) \ge \frac{1}{2} \log \frac{1}{x^2}$$

while

$$\limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^n E X_i^2 \le P,$$

which proves that the proposed coding scheme achieves the feedback capacity.
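Define, for $n \ge 2$,

$$Y_n' = d_n V + U_n + (-\alpha)^{n-1} (\alpha U_0 - \beta Z_0),$$

where the deterministic coefficients $d_n$ depend only on $x$, $\sigma$, $\alpha$, and $\beta$. Then one can show that $Y_n'$ can be represented as a linear combination of $Y_1, \ldots, Y_n$ and hence that

$$E(V - \hat V_n)^2 \le E\left(V - \frac{\sum_{k=2}^n d_k Y_k'}{\sum_{k=2}^n d_k^2}\right)^2.$$

Furthermore, we can check that $x^{2n} \sum_{k=2}^n d_k^2$ converges to a positive finite limit as $n \to \infty$, whence

$$\limsup_{n\to\infty} \frac{1}{n} \log E(V - \hat V_n)^2 \le \log x^2,$$

or equivalently,

$$\liminf_{n\to\infty} \frac{1}{n} I(V; Y^n) \ge \frac{1}{2} \log \frac{1}{x^2}.$$

On the other hand, for $n > 2$,

$$E X_n^2 = x^{-2(n-1)} E(V - \hat V_{n-1})^2 \le x^{-2(n-1)} E\left(V - \frac{\sum_{k=2}^{n-1} d_k Y_k'}{\sum_{k=2}^{n-1} d_k^2}\right)^2,$$

which converges to

$$\lim_{n\to\infty} \frac{x^{-2(n-1)}}{\sum_{k=2}^{n-1} d_k^2} = \frac{(1 + \sigma \alpha x)^2}{(1 + \sigma \beta x)^2} \left(x^{-2} - 1\right) = P.$$

Hence, we have shown that the simple linear coding scheme (17) achieves the ARMA(1) feedback capacity.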
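For white noise ($\alpha = \beta = 0$, so $\sigma = 1$ and $x^2 = 1/(1+P)$), the scheme (17) can be followed exactly through the MMSE error variance $e_n = E(V - \hat V_n)^2$: each $Y_n = x^{-(n-1)}(V - \hat V_{n-1}) + Z_n$ is a scalar Gaussian observation of the current estimation error. The sketch below, restricted to this special case, iterates the standard scalar update $e_n = e_{n-1}/(1 + g_n^2 e_{n-1})$ with gain $g_n = x^{-(n-1)}$ and unit noise variance, and reports $E X_n^2 = g_n^2 e_{n-1}$ and $\frac{1}{n} I(V; Y^n) = \frac{1}{2n}\log(1/e_n)$. The function name is illustrative.

```python
import math

def sk_white_noise(P, n_steps):
    """Track scheme (17) on white unit-variance noise.
    Returns lists of E X_n^2 and (1/n) I(V; Y^n) in nats, for n = 1..n_steps."""
    x = 1.0 / math.sqrt(1.0 + P)
    e = 1.0                          # e_0 = Var(V) = 1, V ~ N(0, 1)
    powers, rates = [], []
    for n in range(1, n_steps + 1):
        g2 = x ** (-2 * (n - 1))     # squared gain (sigma x)^{-2(n-1)} with sigma = 1
        powers.append(g2 * e)        # E X_n^2 = g^2 E(V - V_hat_{n-1})^2
        e = e / (1.0 + g2 * e)       # scalar Gaussian MMSE update, unit noise variance
        rates.append(0.5 * math.log(1.0 / e) / n)
    return powers, rates
```

With $P = 3$, the transmit power climbs $1, 2, 8/3, \ldots$ toward $P$, and $\frac{1}{n}I(V;Y^n)$ approaches $\frac{1}{2}\log 4$ from below at rate $O(1/n)$, in line with the two limits displayed above.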

The coding scheme described above uses the minimum mean-square error decoding of the message $V$, or equivalently, the joint typicality decoding of the Gaussian random codeword $V$, based on the general asymptotic equipartition property of Gaussian processes shown by Cover and Pombra [2, Theorem 5]. Instead of the Gaussian codebook $V$, the transmitter can initially send a real number $\theta$ chosen from an equally spaced signal constellation $\Theta$, say, $\Theta = \{-1, -1 + \delta, \ldots, 1 - \delta, 1\}$, $\delta = 2/(2^{nR} - 1)$, and subsequently correct the receiver's estimation error by sending $\theta - \hat\theta_n$ (up to appropriate scaling as before) at time $n$, where $\hat\theta_n$ is the minimum variance unbiased linear estimate of $\theta$ given $Y^{n-1}$. Now we can verify that the optimal maximum-likelihood decoding is equivalent to finding the $\theta^* \in \Theta$ closest to $\hat\theta_n$, which results in the error probability

$$P_e^{(n)} \le \operatorname{erfc}\left(\sqrt{\frac{c_0\, x^{-2n}}{2^{2nR}}}\right),$$

where $\operatorname{erfc}(u) = \frac{2}{\sqrt{\pi}} \int_u^\infty \exp(-t^2) \, dt$ is the complementary error function and $c_0$ is a constant independent of $n$. This proves that the Schalkwijk-Kailath-Butman coding scheme achieves $C_{FB} = -\log x$ with doubly exponentially decaying error probability.
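The bound explains the doubly exponential decay: $\operatorname{erfc}(u)$ behaves like $e^{-u^2}/(u\sqrt{\pi})$ for large $u$, and $u^2 = c_0\, 2^{2n(-\log_2 x - R)}$ grows exponentially in $n$ whenever $R < -\log_2 x$ bits. A small sketch evaluating the bound, where $c_0 = 1$ and $x = 1/2$ are hypothetical values (the text leaves $c_0$ unspecified) and rates are in bits:

```python
import math

def sk_error_bound(n, R, x, c0):
    """erfc(sqrt(c0 * x^{-2n} / 2^{2nR})), the bound on P_e^(n); c0 is a hypothetical constant."""
    return math.erfc(math.sqrt(c0 * x ** (-2 * n) / 2 ** (2 * n * R)))

# With x = 1/2 (capacity 1 bit) and R = 1/2 bit, the erfc argument is 2^{n/2}.
bounds = [sk_error_bound(n, 0.5, 0.5, 1.0) for n in (2, 4, 6, 8)]
```

Each doubling of the erfc argument roughly squares the bound, so a handful of channel uses already pushes it below $10^{-100}$; for $R$ above $-\log_2 x$ the argument shrinks and the bound tends to $1$.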



4 Concluding remarks 



Although it is still short of what we can call a closed-form solution in general, our variational characterization of Gaussian feedback capacity gives an exact analytic answer for a certain class of channels, as demonstrated in the example of the first-order ARMA Gaussian channel. Our development can be further extended in two directions. First, one can investigate properties of the optimal solution $(S_V^*, B^*)$. Without much surprise, one can show that feedback increases the capacity if and only if the noise spectrum is not white. Furthermore, it can be shown that taking $S_V = 0$ does not incur any loss in maximizing the output entropy, resulting in a simpler maximin characterization of feedback capacity:



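$$C_{FB} = \sup_{\{b_k\}} \inf_{\{a_k\}} \frac{1}{2} \log \int_{-\pi}^{\pi} \Big|1 - \sum_{k=1}^\infty a_k e^{ik\theta}\Big|^2 \Big|1 + \sum_{k=1}^\infty b_k e^{ik\theta}\Big|^2 \, \frac{d\theta}{2\pi},$$

where the supremum is taken over all $\{b_k\}_{k=1}^\infty$ satisfying

$$\int_{-\pi}^{\pi} \Big|\sum_{k=1}^\infty b_k e^{ik\theta}\Big|^2 S_Z(e^{i\theta}) \, \frac{d\theta}{2\pi} \le P.$$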

Second, one can focus on the finite-order ARMA noise spectrum and show that the $k$-dimensional generalization of the Schalkwijk-Kailath-Butman coding scheme is optimal for the ARMA spectrum of order $k$. This confirms many conjectures based on numerical evidence, including the recent study by Yang, Kavcic, and Tatikonda [11]. These results will be reported separately in [12].